Practical - Weeks 4 and 5
(Department of Spatial Sciences)
2024-10-21
A brief introduction to Species Distribution Modelling (SDM)
A tool that aims to predict where a species could potentially occur, based on a limited set of observations.
It can also be used to estimate a species’ niche from its distribution.
It’s a huge and popular field, mainly used in quantitative ecology and conservation.
There are multiple software packages and tutorials, facilitating its implementation.
Examples: sdm, dismo, usdm, ecospat, biomod2, etc.
It’s also a recent field that changes and advances quickly, and it still faces many problems and challenges.
What data do SDMs require?
How do SDMs work?
SDMs relate the biodiversity observations to the environmental data using a variety of algorithms.
Once this relationship has been modelled, they can predict:
Future distributions
Areas where the species might be
Areas suitable for reintroduction
Distribution of invasive species
Different SDM algorithms need different types of occurrence data. These are the three main approaches:
Ensemble methods:
If we fit many models, which one do we choose?
We could choose the one that’s best suited to our data, or the one that performs best.
However, a more popular approach is to build ensemble models.
In this approach, predictions from multiple models are combined or averaged to produce a single prediction.
The most frequently used R package for this is biomod2.
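As a rough illustration of how an ensemble can combine predictions, the sketch below averages the suitability predicted for each cell by several models. It does this either with equal weights or with weights proportional to an evaluation score such as TSS. This is a minimal Python sketch, not biomod2's API. The function names and the `min_score` filter are illustrative assumptions, and scores are assumed to be positive.

```python
from statistics import mean


def ensemble_mean(predictions):
    """Unweighted ensemble: average the suitability predicted for each cell.

    predictions: one list per model, each holding one value per cell.
    """
    return [mean(cell_values) for cell_values in zip(*predictions)]


def ensemble_weighted_mean(predictions, scores, min_score=None):
    """Weighted ensemble: weights are proportional to each model's evaluation
    score (e.g. TSS or AUC on testing data). Models scoring below min_score
    (a hypothetical filter) are left out of the ensemble."""
    kept = [(pred, score) for pred, score in zip(predictions, scores)
            if min_score is None or score >= min_score]
    if not kept:
        raise ValueError("no model passed the min_score filter")
    if any(score <= 0 for _, score in kept):
        raise ValueError("weights must be positive")
    total = sum(score for _, score in kept)
    n_cells = len(kept[0][0])
    return [sum(pred[i] * score for pred, score in kept) / total
            for i in range(n_cells)]


# Three models, three cells (made-up numbers)
preds = [[0.2, 0.8, 0.5], [0.4, 0.6, 0.5], [0.9, 0.1, 0.5]]
print(ensemble_mean(preds))
print(ensemble_weighted_mean(preds, scores=[0.8, 0.6, 0.2], min_score=0.5))
```

A common design choice is to drop poorly performing models before averaging and then weight the rest by their score. Whichever rule is used should be reported.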
ODMAP protocol (Overview, Data, Model, Assessment, Prediction; Zurell et al. 2020)
What are our research objectives?
Which taxa are we working with? Where is our study area? At what scale are we working?
What data is available?
Obtain:
Biodiversity data (generally point observations)
Environmental data (generally rasters)
What kind of biodiversity data do we have?
Ensure the temporal and spatial scales of the biodiversity and environmental data match
Remove unreliable observations from the biodiversity data
Generate pseudo-absences (if necessary)
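Pseudo-absences are needed when the chosen algorithm requires absences but only presences are available. The sketch below shows random sampling under two assumptions: the study area is a raster whose cells are numbered row by row (`ncols` columns), and presences have already been converted to cell numbers. The `min_dist` option excludes cells within a given distance (in cells) of any presence. It is an illustrative assumption loosely resembling distance-based strategies, not any package's implementation.

```python
import math
import random


def sample_pseudo_absences(valid_cells, presence_cells, n, ncols,
                           min_dist=0.0, seed=None):
    """Randomly draw n pseudo-absence cells from the valid (non-NA) cells.

    Cells whose distance (in cell units) to any presence is <= min_dist are
    excluded; with min_dist=0 only the presence cells themselves are excluded.
    """
    rng = random.Random(seed)
    # (row, col) position of every presence cell
    presences = [divmod(cell, ncols) for cell in set(presence_cells)]

    def far_enough(cell):
        row, col = divmod(cell, ncols)
        return all(math.hypot(row - pr, col - pc) > min_dist
                   for pr, pc in presences)

    candidates = sorted(cell for cell in set(valid_cells) if far_enough(cell))
    if n > len(candidates):
        raise ValueError(f"only {len(candidates)} candidate cells available")
    return rng.sample(candidates, n)


# 10 x 10 study area with no NA cells and two presences
pa = sample_pseudo_absences(range(100), [0, 55], n=20, ncols=10,
                            min_dist=1.5, seed=1)
print(sorted(pa))
```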
Remove collinear environmental variables
12 variables from the 19 input variables have collinearity problem:
bio_16 bio_17 bio_19 bio_18 bio_6 bio_4 bio_1 bio_12 bio_10 bio_7 bio_5 bio_15
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( bio_11 ~ bio_2 ): -0.01584258
max correlation ( bio_13 ~ bio_2 ): -0.6067526
---------- VIFs of the remained variables --------
Variables VIF
1 bio_2 3.545935
2 bio_3 2.612967
3 bio_8 2.822370
4 bio_9 2.294625
5 bio_11 6.979614
6 bio_13 2.538466
7 bio_14 3.175843
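The output above is of the kind printed by the usdm package. The underlying idea can be sketched in a few lines. The variance inflation factor of variable j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing j on all the other variables. It equals the j-th diagonal element of the inverse correlation matrix. The stepwise loop below repeatedly drops the variable with the highest VIF until all remaining VIFs fall below a threshold. The threshold of 10 is a common rule of thumb, used here as an assumption, and this is not usdm's code. The sketch also assumes no constant columns and no perfectly collinear variables.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        m = sum(col) / n
        centred.append([v - m for v in col])
    norms = [math.sqrt(sum(v * v for v in col)) for col in centred]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfectly collinear variables")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(data):
    """VIF of each variable in {name: values}: VIF_j = 1 / (1 - R_j^2), which
    equals the j-th diagonal element of the inverse correlation matrix."""
    names = list(data)
    inverse = invert(correlation_matrix([data[name] for name in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


def vif_step(data, th=10.0):
    """Drop the variable with the highest VIF, recompute, and repeat until
    every remaining VIF is below th. Returns (excluded names, final VIFs)."""
    data = dict(data)
    excluded = []
    while len(data) > 1:
        current = vifs(data)
        worst = max(current, key=current.get)
        if current[worst] < th:
            break
        excluded.append(worst)
        del data[worst]
    return excluded, vifs(data)


# Made-up predictors: "b" is almost a copy of "a", "c" is unrelated
a = [float(i) for i in range(20)]
b = [i + 0.5 * (i % 3 - 1) for i in range(20)]
c = [(-1.0) ** i for i in range(20)]
excluded, remaining = vif_step({"a": a, "b": b, "c": c}, th=10.0)
print(excluded, remaining)
```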
Spatial thinning: keep only one presence/absence per environmental raster cell
Split the data into training (model fitting) and testing (evaluation) sets
$initial
[1] 2842
$kept
[1] 2443
$out
[1] 399
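The sketch below covers both steps. It assumes that points are (x, y) pairs in the same coordinate system as a north-up raster with top-left corner (`xmin`, `ymax`), square cells of size `res` and `ncols` columns, and that all points fall inside the raster. The keys `initial`, `kept` and `out` only mirror the output shown above; the function that produced that output in the practical is not shown here. The thinning function would be run separately on presences and on absences.

```python
import random


def cell_index(x, y, xmin, ymax, res, ncols):
    """Row-major number of the raster cell containing (x, y).

    Assumes a north-up raster with top-left corner (xmin, ymax), square cells
    of size res, ncols columns, and that (x, y) lies inside the raster.
    """
    col = int((x - xmin) // res)
    row = int((ymax - y) // res)
    return row * ncols + col


def thin_by_cell(points, xmin, ymax, res, ncols):
    """Keep only the first point that falls in each raster cell."""
    seen, kept = set(), []
    for x, y in points:
        cell = cell_index(x, y, xmin, ymax, res, ncols)
        if cell not in seen:
            seen.add(cell)
            kept.append((x, y))
    return {"initial": len(points), "kept": len(kept),
            "out": len(points) - len(kept), "points": kept}


def train_test_split(records, test_fraction=0.3, seed=None):
    """Shuffle the records and return (training, testing) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


pts = [(0.1, 0.1), (0.2, 0.3), (1.5, 0.5), (1.6, 0.9), (0.5, 1.5)]
thinned = thin_by_cell(pts, xmin=0.0, ymax=2.0, res=1.0, ncols=2)
print(thinned["initial"], thinned["kept"], thinned["out"])  # 5 3 2
train, test = train_test_split(thinned["points"], test_fraction=0.3, seed=42)
```

Thinning before splitting keeps near-duplicate records from landing in both the training and the testing set.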
Model selection:
Single model? Which?
Ensemble? How to average over models?
Which model settings should we use? E.g., the training/testing split (%), the number of cross-validation folds, and the number of repetitions (robustness vs. computational time)
If we want to produce binary predictions, which threshold should we use?
It all depends on our data and our objectives
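To make the fold and repetition settings concrete, here is a minimal sketch of repeated k-fold cross-validation indices. Each repetition reshuffles the records and splits them into `k` folds, and each fold serves once as the testing set. With `k` folds and `repetitions` repetitions, every algorithm is fitted k * repetitions times. This is where the trade-off between robustness and computational time comes from. The function is hypothetical and not tied to any SDM package.

```python
import random


def repeated_kfold(n, k=5, repetitions=3, seed=None):
    """Yield (repetition, fold, train_idx, test_idx) for repeated k-fold CV.

    In every repetition the n record indices are reshuffled and dealt into
    k folds of (almost) equal size; each fold is the testing set once, and
    the remaining k - 1 folds form the training set.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    rng = random.Random(seed)
    for rep in range(repetitions):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield rep, f, sorted(train), sorted(test)


# 23 records, 5 folds, 2 repetitions -> 10 model fits per algorithm
for rep, fold, train, test in repeated_kfold(23, k=5, repetitions=2, seed=0):
    print(rep, fold, len(train), len(test))
```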
Exploration of response curves
Assessment of model coefficients and variable importance
Do our results make sense?
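Response curves and variable importance can both be computed from any fitted model that returns a prediction for a row of predictor values. In the sketch below, `predict` is a HYPOTHETICAL logistic model with made-up coefficients standing in for a fitted SDM. The response curve varies one predictor across its observed range while holding the others at their means. The importance score is 1 minus the correlation between predictions on the original data and on data with one predictor shuffled. This follows the general idea of the permutation-based importance reported by packages such as biomod2, but it is not their implementation.

```python
import math
import random


def predict(row):
    """HYPOTHETICAL fitted model (made-up coefficients): probability of
    presence as a logistic function of temperature and precipitation."""
    temp, precip = row
    z = -1.0 + 0.8 * temp - 0.05 * temp ** 2 + 0.002 * precip
    return 1.0 / (1.0 + math.exp(-z))


def response_curve(model, data, var, n_points=20):
    """Vary predictor `var` over its observed range, holding the other
    predictors at their means; return (value, prediction) pairs."""
    means = [sum(col) / len(col) for col in zip(*data)]
    lo = min(row[var] for row in data)
    hi = max(row[var] for row in data)
    curve = []
    for i in range(n_points):
        value = lo + (hi - lo) * i / (n_points - 1)
        row = list(means)
        row[var] = value
        curve.append((value, model(row)))
    return curve


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_importance(model, data, var, seed=None):
    """1 - correlation between predictions on the original data and on a copy
    in which predictor `var` is randomly shuffled (0 = no influence)."""
    rng = random.Random(seed)
    base = [model(row) for row in data]
    shuffled = [row[var] for row in data]
    rng.shuffle(shuffled)
    permuted = []
    for row, value in zip(data, shuffled):
        new_row = list(row)
        new_row[var] = value
        permuted.append(model(new_row))
    return 1.0 - pearson(base, permuted)


# Made-up training data: [temperature, precipitation] per record
data = [[float(t), float(p)] for t, p in
        zip(range(11), [0, 50, 400, 120, 300, 10, 250, 90, 480, 200, 60])]
curve = response_curve(predict, data, var=0, n_points=5)
print([round(s, 3) for _, s in curve])
print([round(permutation_importance(predict, data, v, seed=0), 3) for v in (0, 1)])
```

A response curve that looks ecologically implausible (e.g. suitability rising without limit at extreme temperatures) is a warning sign worth checking against what is known about the species.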
Model performance metrics:
AUC: area under the receiver operating characteristic (ROC) curve (values closer to 1 are better, but beware: very high values might indicate overfitting)
TSS: true skill statistic (sensitivity + specificity - 1)
Sensitivity: true positive rate
Specificity: true negative rate
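The following minimal sketch computes these metrics for presence/absence testing data. It covers sensitivity and specificity at a given threshold, TSS = sensitivity + specificity - 1, and AUC. AUC is computed as the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence, with ties counting one half. The sketch also shows one possible answer to the threshold question raised earlier: choosing the threshold that maximises TSS, which is one common option among several. The example data are made up, and the code assumes the testing data contain at least one presence and one absence.

```python
def confusion_counts(obs, pred, threshold):
    """TP, FP, FN, TN when predictions >= threshold count as presences.
    obs holds 1 for presence and 0 for (pseudo-)absence."""
    tp = fp = fn = tn = 0
    for o, p in zip(obs, pred):
        predicted_presence = p >= threshold
        if o == 1 and predicted_presence:
            tp += 1
        elif o == 1:
            fn += 1
        elif predicted_presence:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sens_spec_tss(obs, pred, threshold):
    """Sensitivity (true positive rate), specificity (true negative rate)
    and TSS = sensitivity + specificity - 1."""
    tp, fp, fn, tn = confusion_counts(obs, pred, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1


def auc(obs, pred):
    """AUC as the probability that a random presence gets a higher prediction
    than a random absence (ties count as one half)."""
    pos = [p for o, p in zip(obs, pred) if o == 1]
    neg = [p for o, p in zip(obs, pred) if o == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def max_tss_threshold(obs, pred):
    """Candidate threshold (among the predicted values) with the highest TSS."""
    return max(sorted(set(pred)), key=lambda t: sens_spec_tss(obs, pred, t)[2])


# Made-up testing data
obs = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
print(auc(obs, pred))                 # 0.9375
print(sens_spec_tss(obs, pred, 0.5))  # (0.75, 0.75, 0.5)
print(max_tss_threshold(obs, pred))
```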
Map the potential distribution obtained from the modelling phase.
Project it into different regions and time periods (e.g., future climate scenarios).
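Projecting a model into a different region or time period means applying the same fitted model to environmental layers for that region or scenario. The minimal sketch below assumes the layers are aligned grids (2D lists) holding the same variables, in the same order, as those used for fitting, with NaN marking cells without data. The model passed in is hypothetical. Predictions for conditions outside the range of the training data are extrapolations and should be interpreted with care.

```python
import math


def project(model, layers, threshold=None):
    """Apply a fitted model cell by cell to a stack of aligned environmental
    layers (2D lists of equal shape, same variable order as in fitting).

    Returns (suitability, binary): cells where any layer is NaN stay NaN, and
    binary is None unless a threshold is given.
    """
    nrows, ncols = len(layers[0]), len(layers[0][0])
    suitability = [[math.nan] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            values = [layer[r][c] for layer in layers]
            if not any(math.isnan(v) for v in values):
                suitability[r][c] = model(values)
    if threshold is None:
        return suitability, None
    binary = [[math.nan if math.isnan(s) else int(s >= threshold) for s in row]
              for row in suitability]
    return suitability, binary


def toy_model(values):
    """HYPOTHETICAL fitted model: suitability rises with mean temperature."""
    return min(1.0, max(0.0, values[0] / 10.0))


nan = math.nan
current = [[[2.0, 8.0], [nan, 5.0]]]  # one layer: mean temperature, today
future = [[[4.0, 9.0], [nan, 7.0]]]   # same layer under a future scenario
for name, stack in (("current", current), ("future", future)):
    suit, binary = project(toy_model, stack, threshold=0.5)
    print(name, suit, binary)
```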
Underlying assumptions:
Species are at equilibrium with the environment
Species and environment are well sampled
We are considering all primary factors determining species distributions
The observation process can also bias our results
Introduction to species distribution modelling (SDM) in R by Damaris Zurell
Species distribution modelling practicals (Macroecology and global change course) by Damaris Zurell
ENM2020: A Free Online Course and Set of Resources on Modeling Species’ Niches and Distributions led by Town Peterson (YouTube playlist, Schedule and PDFs, Publication)
Best Practices in Species Distribution Modeling: A workshop in R by Adam Smith
Species distribution modeling in R by Robert Hijmans and Jane Elith
A very brief introduction to species distribution models in R by Jeff Oliver
SDM course by Bob Muscarella (very useful for finding other resources!)